Convergence of simulated annealing 
by the generalized transition probability 

Hidetoshi Nishimori and Jun-ichi Inoue 

Department of Physics, Tokyo Institute of Technology 
Oh-okayama, Meguro-ku, Tokyo 152, Japan 

February 1, 2008 
Abstract 

We prove weak ergodicity of the inhomogeneous Markov process generated by the generalized 
transition probability of Tsallis and Stariolo under power-law decay of the temperature. We thus 
have a mathematical foundation to conjecture convergence of simulated annealing processes with 
the generalized transition probability to the minimum of the cost function. An explicitly solvable 
example in one dimension is analyzed in which the generalized transition probability leads to a fast 
convergence of the cost function to the optimal value. We also investigate how far our arguments 
depend upon the specific form of the generalized transition probability proposed by Tsallis and 
Stariolo. It is shown that a few requirements on analyticity of the transition probability are 
sufficient to assure fast convergence in the case of the solvable model in one dimension. 

1 Introduction 

Simulated annealing has been a powerful tool for combinatorial optimization problems [1, 2, 3]. To
find the minimum of a cost function, one introduces a stochastic process similar to Monte Carlo 
simulations in statistical mechanics with a control parameter corresponding to the temperature to 
allow the system to escape from local minima. By gradually decreasing the temperature one searches 
for increasingly narrower regions in the phase space closer to the optimal state, eventually reaching 
the optimal state itself in the infinite-time limit. 

A very important factor in such processes is the annealing schedule, or the rate of decrease of 
temperature. If one lowers the temperature too quickly, the system may end up in one of the local 
minima. On the other hand, a very slow decrease of temperature would surely bring the system to the 
true minimum. However, such a slow process is not practically useful. One therefore has to determine 
carefully how fast to decrease the temperature in simulated annealing. On this problem, Geman
and Geman [3] proved that the decrease of temperature as T = const/log t, with the proportionality
constant roughly of the order of the system size, guarantees convergence to the optimal state for a wide
class of combinatorial optimization problems. This inverse-log law is still too slow for most practical
purposes. Nevertheless this result serves as a mathematical background of empirical investigations by 
numerical methods. 

There have been a few proposals to accelerate the annealing schedule by modifying the transition
probabilities used in the conventional simulated annealing. Szu and Hartley [4] pointed out for
a problem defined in a continuum space that occasional non-local samplings significantly improve
the performance, leading to an annealing schedule inversely proportional to time, T = const/t. This
non-local sampling corresponds to modification of the generation probability (or, more precisely, the
neighbourhood) to be defined later. Tsallis and Stariolo [5] proposed to modify the acceptance
probability, generalizing the usual Boltzmann form, in addition to the generation probability (which
they call the visiting distribution). Numerical investigations show faster convergence to the optimal state by
annealing processes using their generalized transition probability or its modifications ||, [7|, ||] . Szu and 
Hartley and Tsallis and Stariolo proved that the modified generation probability assures convergence
to the optimal state under a power-law decrease of the temperature as a function of time. However,
there has been no mathematically rigorous argument on the convergence under the generalized
acceptance probability of Tsallis and Stariolo.

We prove in the present paper that the inhomogeneous Markov process generated by the general- 
ized transition (acceptance) probability of Tsallis and Stariolo satisfies the property of weak ergodicity 
under an annealing schedule inversely proportional to the power of time. Rigorously speaking, weak 
ergodicity (which roughly means asymptotic independence of the probability distribution from the 
initial condition) itself does not immediately guarantee the convergence to the optimal state. Never- 
theless our result is expected to be close enough to this final goal because the probability distribution 
would depend upon the initial condition if the annealing schedule is not appropriately chosen. 

Various definitions are given in the next section. The proof of our main theorem appears in section 
3. An example of fast convergence by the generalized transition probability is discussed in section 4 
for a parameter range not covered by the theorem in section 3. In section 5 we investigate if we may 
further generalize the transition probability in the case of the simple model discussed in section 4. 
The final section is devoted to discussions on the significance of our result. 



2 Inhomogeneous Markov chain 

Let us first list various definitions to fix the notation. We consider a problem of combinatorial
optimization with the space of states denoted by S. The size of S is finite. The cost function E is
a real single-valued function on S. The goal of a combinatorial optimization problem is to find the
minimum (or minima) of the cost function. For this purpose we introduce the process of simulated
annealing using the Markov chain generated by the transition probability from state x (∈ S) to state
y (∈ S) at time step t:

$$
G(x,y;t) =
\begin{cases}
P(x,y)\, A(x,y;T(t)) & (x \neq y) \\
1 - \sum_{z \in S} P(x,z)\, A(x,z;T(t)) & (x = y),
\end{cases}
\tag{1}
$$

where P(x,y) is the generation probability

$$
P(x,y)
\begin{cases}
> 0 & (y \in S_x) \\
= 0 & (\text{otherwise})
\end{cases}
\tag{2}
$$

with S_x the neighbourhood of x (the set of states that can be reached by a single step from x),
and A(x, y; T) is the acceptance probability. In the case of the generalized transition probability, the
acceptance probability is given as [5]

$$ A(x,y;T) = \min\{1, u(x,y;T)\} $$

$$ u(x,y;T) = \left( 1 + (q-1)\,\frac{E(y)-E(x)}{T} \right)^{1/(1-q)} \tag{3} $$



where q is a real parameter. For technical reasons we have to restrict ourselves to the region q > 1 in
this and the next sections. This acceptance probability reduces to the usual Boltzmann form in the
limit q → 1. The present Markov chain is inhomogeneous, i.e., the transition probability (1) depends
on the time step t through the time dependence of T(t).
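
The acceptance rule (3) and its q → 1 limit can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names are hypothetical, and the sketch assumes q > 1 and treats every move with E(y) − E(x) ≤ 0 as accepted with probability 1.

```python
import math


def generalized_u(delta_e: float, temperature: float, q: float) -> float:
    """u(x, y; T) of equation (3) for an uphill move (delta_e > 0) and q > 1."""
    return (1.0 + (q - 1.0) * delta_e / temperature) ** (1.0 / (1.0 - q))


def generalized_acceptance(delta_e: float, temperature: float, q: float) -> float:
    """A = min{1, u}; downhill or neutral moves are always accepted (assumed convention)."""
    if delta_e <= 0.0:
        return 1.0
    return min(1.0, generalized_u(delta_e, temperature, q))


def boltzmann_acceptance(delta_e: float, temperature: float) -> float:
    """Conventional Boltzmann-type form, which equation (3) approaches as q -> 1."""
    return min(1.0, math.exp(-delta_e / temperature))


if __name__ == "__main__":
    # The generalized value approaches the Boltzmann value as q decreases towards 1.
    for q in (2.0, 1.5, 1.1, 1.001):
        print(q, generalized_acceptance(1.0, 0.5, q))
    print("Boltzmann", boltzmann_acceptance(1.0, 0.5))
```

For q > 1 the uphill acceptance decays only as a power of 1/T, whereas the Boltzmann form decays exponentially; this difference is what the later sections exploit.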

We choose the annealing schedule, or the t-dependence of the parameter T (the temperature), as 

$$ T(t) = \frac{b}{(t+2)^c} \qquad (b, c > 0,\ t = 0, 1, 2, \ldots). \tag{4} $$

To analyze the inhomogeneous Markov chain generated by the above transition probability we
introduce the transition matrix G(t) with the element 

$$ [G(t)]_{x,y} = G(x,y;t). \tag{5} $$
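
As an illustration of how the transition matrix is assembled from the generation probability, the acceptance probability and the annealing schedule, the following sketch builds G(t) for a small toy problem. The ring-shaped neighbourhood, the uniform generation probability P(x, y) = 1/2 and all function names are assumptions made for this example only.

```python
def temperature(t, b=1.0, c=0.5):
    """Power-law annealing schedule T(t) = b / (t + 2)^c."""
    return b / (t + 2) ** c


def acceptance(delta_e, temp, q):
    """Generalized acceptance probability min{1, u} for q > 1."""
    if delta_e <= 0.0:
        return 1.0
    return min(1.0, (1.0 + (q - 1.0) * delta_e / temp) ** (1.0 / (1.0 - q)))


def transition_matrix(energies, t, q=2.0, b=1.0, c=0.5):
    """G(t) on a ring of states; each state proposes its two neighbours with P = 1/2."""
    n = len(energies)
    if n < 3:
        raise ValueError("the ring needs at least three states")
    temp = temperature(t, b, c)
    matrix = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in ((x - 1) % n, (x + 1) % n):
            matrix[x][y] = 0.5 * acceptance(energies[y] - energies[x], temp, q)
        # Diagonal element: probability that the proposed move is rejected.
        matrix[x][x] = 1.0 - sum(matrix[x][z] for z in range(n) if z != x)
    return matrix


if __name__ == "__main__":
    for row in transition_matrix([0.0, 1.0, 0.5, 1.5, 1.0], t=10):
        print(["%.3f" % entry for entry in row])
```

Each row sums to one by construction, and the matrix changes with t only through the temperature, which is what makes the chain inhomogeneous.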

Let us write the set of probability distributions on S as 𝒫. A probability distribution p (∈ 𝒫) may
be regarded as a vector with the component [p]_x = p(x) (x ∈ S). Using this matrix-vector notation,
the probability distribution at time step t, starting from an initial distribution p_0 (∈ 𝒫) at time s, is
written as

$$ p(s,t) = p_0\, G(s,t) = p_0\, G(s)\, G(s+1) \cdots G(t-1). \tag{6} $$

The coefficient of ergodicity is defined as

$$ \alpha(G) = 1 - \min\Big\{ \sum_{z \in S} \min\{G(x,z),\, G(y,z)\} \,\Big|\, x, y \in S \Big\}. \tag{7} $$
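
The coefficient of ergodicity is straightforward to evaluate for a small stochastic matrix. The sketch below is an illustration with a hypothetical function name; the matrix is assumed to be given as a list of rows.

```python
def ergodicity_coefficient(matrix):
    """1 minus the smallest row overlap sum_z min{G(x,z), G(y,z)} over all pairs (x, y)."""
    n = len(matrix)
    smallest_overlap = min(
        sum(min(matrix[x][z], matrix[y][z]) for z in range(n))
        for x in range(n)
        for y in range(n)
    )
    return 1.0 - smallest_overlap


if __name__ == "__main__":
    print(ergodicity_coefficient([[1.0, 0.0], [0.0, 1.0]]))  # rows never overlap
    print(ergodicity_coefficient([[0.5, 0.5], [0.5, 0.5]]))  # rows are identical
```

The coefficient equals 1 when some pair of rows has disjoint support and 0 when all rows coincide, so 1 − α measures how strongly a single application of the matrix erases the memory of the starting state.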

We shall prove in the next section the property of weak ergodicity for the present Markov chain, 
which means that the probability distribution function after sufficiently long time becomes independent 
of the initial condition: 

$$ \forall s \ge 0 : \lim_{t\to\infty} \sup\{ \|p_1(s,t) - p_2(s,t)\| \mid p_{01}, p_{02} \in \mathcal{P} \} = 0, \tag{8} $$

where p_1(s, t) and p_2(s, t) are the probability distributions with different initial conditions p_01 and p_02:

$$ p_1(s,t) = p_{01}\, G(s,t) \tag{9} $$

$$ p_2(s,t) = p_{02}\, G(s,t). \tag{10} $$

The norm is defined by

$$ \|p_1 - p_2\| = \sum_{x \in S} |p_1(x) - p_2(x)|. \tag{11} $$

Although we focus our attention on weak ergodicity in the present paper, it may be useful as a reference 
to recall the definition of strong ergodicity: 

$$ \exists r \in \mathcal{P},\ \forall s \ge 0 : \lim_{t\to\infty} \sup\{ \|p(s,t) - r\| \mid p_0 \in \mathcal{P} \} = 0. \tag{12} $$

The following theorems give criteria for weak and strong ergodicity [2, 3]:

Theorem 1 (Condition for weak ergodicity) An inhomogeneous Markov chain is weakly ergodic 
if and only if there exists a strictly increasing sequence of positive numbers 

$$ t_0 < t_1 < \cdots < t_i < t_{i+1} < \cdots $$

such that

$$ \sum_{i=0}^{\infty} \big( 1 - \alpha(G(t_i, t_{i+1})) \big) = \infty. \tag{13} $$

Theorem 2 (Condition for strong ergodicity) An inhomogeneous Markov chain is strongly
ergodic if it satisfies the following conditions:

1. it is weakly ergodic

2. there exists p_t ∈ 𝒫 (∀t ≥ 0) such that p_t = p_t G(t)

3. p_t satisfies

$$ \sum_{t=0}^{\infty} \|p_t - p_{t+1}\| < \infty. \tag{14} $$



3 Weak ergodicity

We prove in the present section that the condition in Theorem 1 is satisfied by the present
inhomogeneous Markov chain generated by the generalized transition probability. The argument closely
follows that for the conventional Boltzmann-type transition probability [1, 2, 3]. We need the following
Lemma for this purpose.

Lemma 1 (Lower bound on the transition probability) The elements of the transition matrix 
satisfy the following bounds. For off-diagonal elements, 

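$$ P(x,y) > 0 \;\Rightarrow\; \forall t \ge 0 : G(x,y;t) \ge w \left( 1 + \frac{(q-1)L}{T(t)} \right)^{1/(1-q)}, \tag{15} $$

and for diagonal elements,

$$ \forall x \in S - S_m,\ \exists t_1 > 0,\ \forall t \ge t_1 : G(x,x;t) \ge w \left( 1 + \frac{(q-1)L}{T(t)} \right)^{1/(1-q)}, \tag{16} $$

where S_m is the set of locally maximum states

$$ S_m = \{ x \mid x \in S,\ \forall y \in S_x : E(y) \le E(x) \}, \tag{17} $$

L denotes the maximum change of the cost function by a single step

$$ L = \max\{ |E(x) - E(y)| \mid P(x,y) > 0 \}, \tag{18} $$

and w is the minimum value of P(x, y)

$$ w = \min\{ P(x,y) \mid P(x,y) > 0,\ x, y \in S \}. \tag{19} $$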

Proof. 

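First we prove (15). When E(y) − E(x) > 0, we have u(x,y;T(t)) < 1 and thus

$$
\begin{aligned}
G(x,y;t) &= P(x,y)\, A(x,y;T(t)) \\
&\ge w \min\{1, u(x,y;T(t))\} \\
&= w\, u(x,y;T(t)) \\
&\ge w \left( 1 + \frac{(q-1)L}{T(t)} \right)^{1/(1-q)}.
\end{aligned}
\tag{20}
$$

If E(y) − E(x) ≤ 0, u(x,y;T(t)) ≥ 1 and therefore

$$
\begin{aligned}
G(x,y;t) &\ge w \min\{1, u(x,y;T(t))\} \\
&= w \\
&\ge w \left( 1 + \frac{(q-1)L}{T(t)} \right)^{1/(1-q)}.
\end{aligned}
\tag{21}
$$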

We next prove (16). Since x ∈ S − S_m, there exists a state y ∈ S_x satisfying E(y) − E(x) > 0. For
such a state y,

$$ \lim_{t\to\infty} u(x,y;T(t)) = 0 \tag{22} $$

and consequently

$$ \lim_{t\to\infty} \min\{1, u(x,y;T(t))\} = 0. \tag{23} $$

Then min{1, u(x, y; T(t))} can be made arbitrarily small for sufficiently large t. More precisely, there
exist t_1 > 0 and 0 < ε < 1 such that

$$ \forall t \ge t_1 : \min\{1, u(x,y;T(t))\} \le \epsilon. \tag{24} $$

We therefore have 

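$$
\begin{aligned}
\sum_{z \in S} P(x,z)\, A(x,z;T(t)) &= P(x,y) \min\{1, u(x,y;T(t))\} + \sum_{z \in S-\{y\}} P(x,z) \min\{1, u(x,z;T(t))\} \\
&\le P(x,y)\,\epsilon + \sum_{z \in S-\{y\}} P(x,z) \\
&= -(1-\epsilon) P(x,y) + 1.
\end{aligned}
\tag{25}
$$

The diagonal element of (1) thus satisfies

$$ G(x,x;t) \ge (1-\epsilon) P(x,y) \ge w \left( 1 + \frac{(q-1)L}{T(t)} \right)^{1/(1-q)}, \tag{26} $$

where we have used that the last factor can be made arbitrarily small for sufficiently large t. □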
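
The off-diagonal bound of Lemma 1 can be checked numerically on a toy problem. The sketch below assumes a ring of states with uniform generation probability P = 1/2 (so w = 1/2), the power-law schedule T(t) = b/(t + 2)^c, and q > 1; the function name and the example energies are hypothetical.

```python
def lemma_check(energies, t, q=2.0, b=1.0, c=0.5):
    """Return (smallest off-diagonal element with P > 0, lower bound of Lemma 1)."""
    n = len(energies)
    temp = b / (t + 2) ** c
    w = 0.5  # every allowed move is proposed with probability 1/2
    # L: largest change of the cost function in a single step on the ring.
    big_l = max(abs(energies[x] - energies[(x + 1) % n]) for x in range(n))
    bound = w * (1.0 + (q - 1.0) * big_l / temp) ** (1.0 / (1.0 - q))
    smallest = 1.0
    for x in range(n):
        for y in ((x - 1) % n, (x + 1) % n):
            delta_e = energies[y] - energies[x]
            if delta_e > 0.0:
                u = (1.0 + (q - 1.0) * delta_e / temp) ** (1.0 / (1.0 - q))
            else:
                u = 1.0  # downhill or neutral moves are always accepted
            smallest = min(smallest, w * min(1.0, u))
    return smallest, bound


if __name__ == "__main__":
    for t in (0, 10, 100, 1000):
        print(t, lemma_check([0.0, 1.0, 0.5, 1.5, 1.0], t))
```

The bound is attained by the steepest uphill move, so the two printed numbers coincide whenever that move is among the allowed transitions.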

We use the following notations in the proof of weak ergodicity. The minimum number of state 
transitions to reach y from x (or vice versa) is written as d(x,y). One can then reach any state from 
x within k(x) steps: 

$$ k(x) = \max\{ d(x,y) \mid y \in S \}. \tag{27} $$

The minimum of k(x) for x ∈ S − S_m is denoted as R, and the state giving this minimum value is x*:

$$ R = \min\{ k(x) \mid x \in S - S_m \} \tag{28} $$

$$ x^* = \arg\min\{ k(x) \mid x \in S - S_m \}. \tag{29} $$



Theorem 3 (Weak ergodicity) The inhomogeneous Markov chain defined in section 2 is weakly
ergodic if 0 < c ≤ (q − 1)/R.

Proof. 

Consider a transition from state x to x*. According to the definition (6) of the double-time
transition matrix, we have

$$ G(x,x^*;t-R,t) = \sum_{x_1,\ldots,x_{R-1}} G(x,x_1;t-R)\, G(x_1,x_2;t-R+1) \cdots G(x_{R-1},x^*;t-1). \tag{30} $$

From the definitions of x* and R, there exists at least one sequence of transitions to reach x* from x 
within R steps such that 

$$ x \neq x_1 \neq x_2 \neq \cdots \neq x_k = x_{k+1} = \cdots = x_R = x^*. \tag{31} $$

If we keep only such a sequence in the summation of (30) and use Lemma 1,

$$
\begin{aligned}
G(x,x^*;t-R,t) &\ge G(x,x_1;t-R)\, G(x_1,x_2;t-R+1) \cdots G(x_{R-1},x_R;t-1) \\
&\ge \prod_{k=1}^{R} w \left( 1 + \frac{(q-1)L}{T(t-R+k-1)} \right)^{1/(1-q)} \\
&\ge w^R \left( 1 + \frac{(q-1)L}{T(t-1)} \right)^{R/(1-q)}.
\end{aligned}
\tag{32}
$$



Then the coefficient of ergodicity satisfies 

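$$
\begin{aligned}
\alpha(G(t-R,t)) &= 1 - \min\Big\{ \sum_{z \in S} \min\{G(x,z;t-R,t),\, G(y,z;t-R,t)\} \,\Big|\, x, y \in S \Big\} \\
&\le 1 - \min\{ \min\{G(x,x^*;t-R,t),\, G(y,x^*;t-R,t)\} \mid x, y \in S \} \\
&\le 1 - w^R \left( 1 + \frac{(q-1)L}{T(t-1)} \right)^{R/(1-q)}.
\end{aligned}
\tag{33}
$$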

We now use the annealing schedule (4). There exists a non-negative integer k_0 such that the following
inequalities hold for all k ≥ k_0:

$$ 1 - \alpha(G(kR-R,kR)) \ge w^R \left( 1 + \frac{(q-1)L(kR+1)^c}{b} \right)^{R/(1-q)} \ge \frac{b'}{(k+1/R)^{Rc/(q-1)}}, \tag{34} $$

with a positive constant b' independent of k. It is clear from (34) that the summation

$$ \sum_{k=0}^{\infty} \big(1 - \alpha(G(kR-R,kR))\big) = \sum_{k=0}^{k_0-1} \big(1 - \alpha(G(kR-R,kR))\big) + \sum_{k=k_0}^{\infty} \big(1 - \alpha(G(kR-R,kR))\big) \tag{35} $$

diverges if c satisfies 0 < c ≤ (q − 1)/R. This proves weak ergodicity according to Theorem 1. □
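
Weak ergodicity can be observed numerically by propagating two different initial distributions through the same sequence of transition matrices and monitoring their distance in the norm (11). The sketch below is an illustration under stated assumptions, not part of the proof: it uses a ring of five states with uniform generation probability 1/2, for which every state reaches every other within two steps (R = 2), together with q = 2 and c = 1/2 = (q − 1)/R. All names and parameter values are hypothetical.

```python
def acceptance(delta_e, temp, q):
    """Generalized acceptance probability min{1, u} for q > 1."""
    if delta_e <= 0.0:
        return 1.0
    return min(1.0, (1.0 + (q - 1.0) * delta_e / temp) ** (1.0 / (1.0 - q)))


def step(dist, energies, t, q, b, c):
    """One step p -> p G(t) on the ring, without storing G(t) explicitly."""
    n = len(energies)
    temp = b / (t + 2) ** c
    new = [0.0] * n
    for x, px in enumerate(dist):
        stay = px
        for y in ((x - 1) % n, (x + 1) % n):
            flow = px * 0.5 * acceptance(energies[y] - energies[x], temp, q)
            new[y] += flow
            stay -= flow
        new[x] += stay
    return new


def l1_distance(p1, p2):
    """Norm of the difference of two distributions."""
    return sum(abs(a - b) for a, b in zip(p1, p2))


def distance_history(energies, steps, q=2.0, b=1.0, c=0.5):
    """Distance between the chains started from state 0 and from state 2."""
    n = len(energies)
    p1 = [1.0 if i == 0 else 0.0 for i in range(n)]
    p2 = [1.0 if i == 2 else 0.0 for i in range(n)]
    history = [l1_distance(p1, p2)]
    for t in range(steps):
        p1 = step(p1, energies, t, q, b, c)
        p2 = step(p2, energies, t, q, b, c)
        history.append(l1_distance(p1, p2))
    return history


if __name__ == "__main__":
    hist = distance_history([0.0, 1.0, 0.5, 1.5, 1.0], 2000)
    for t in (0, 1, 10, 100, 1000, 2000):
        print(t, hist[t])
```

The distance can never increase, because every stochastic matrix is a contraction in this norm; weak ergodicity is the stronger statement that it tends to zero, which requires the temperature not to fall too quickly.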

Remark 1. The arguments developed in sections 2 and 3 break down for q < 1. For instance, the
argument of the outer parentheses on the right hand side of (3) becomes negative for sufficiently
small T if E(y) − E(x) > 0 and q < 1. The acceptance probability is regarded as vanishing in such a
case in numerical calculations |], 0]. However, it is difficult to modify the present proof to adopt this 
convention used in numerical investigations. In any case, Theorem 3 does not exclude the possibility that
the present Markov chain is weakly ergodic for q < 1 or that it is strongly ergodic for arbitrary q.

Remark 2. The condition for weak ergodicity given in Theorem 1 is similar to the condition of
"infinitely often in time" used to show convergence under the generalized generation probability in
continuum space || |6| . 

Remark 3. Theorem 3 with the annealing schedule (4) does not immediately mean a fast
convergence of the expectation value of the cost function. We have proved convergence only in
the sense of weak ergodicity, not a fast convergence of the expectation value of the cost function. See
section 6 for detailed discussions on this point.



4 Case of q < 1 

It is instructive to investigate a simple solvable model with the parameter q < 1 because the general 
analysis in the previous section excluded this range of q for technical reasons. The one-dimensional 
model discussed by Shinomoto and Kabashima [9] is particularly suited for this purpose.

They considered the thermal diffusion process of an object in a one-dimensional space. The object
is located on one of the discrete positions x = ai, with i an integer, and is under the potential E(x) =
x²/2. Hoppings to neighbouring positions i + 1 and i − 1 take place if thermal fluctuations allow the
object to climb over the barriers with height B for the process i → i − 1 and height B + Δ_i for i → i + 1,
where Δ_i is the difference of the potentials at neighbouring locations, Δ_i = E(a(i + 1)) − E(ai). By
adaptively optimizing the temperature at each given time, they found that the energy (the expectation
value of the potential at the position of the object) decreases as B/log t. The optimum annealing
schedule T_opt(t) was shown to have this same asymptotic behaviour as a function of t. We show in
the present section that the generalized transition probability with q = 1/2 leads to a much faster
convergence of the energy.

It should be noted that the analysis of the present section is not an application of the general
theorem in the previous section. For example, q is less than 1 here, the number of possible states is
not finite (i runs from −∞ to ∞), and the annealing schedule will turn out to be t^{-1}, not t^{-c}. The
purpose of the present section is to show the existence of a case, independently of Theorem 3, where
the generalized transition probability yields much faster decrease of the temperature and energy.

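The problem is defined by the master equation describing the time evolution of the probability P_i
that the object is at the ith position at time t:

$$
\begin{aligned}
\frac{dP_i}{dt} ={}& \left( 1 + (q-1)\frac{B}{T} \right)^{1/(1-q)} P_{i+1} + \left( 1 + (q-1)\frac{B+\Delta_{i-1}}{T} \right)^{1/(1-q)} P_{i-1} \\
& - \left( 1 + (q-1)\frac{B+\Delta_i}{T} \right)^{1/(1-q)} P_i - \left( 1 + (q-1)\frac{B}{T} \right)^{1/(1-q)} P_i.
\end{aligned}
\tag{36}
$$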

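It is straightforward to show that this master equation reduces to the following Fokker-Planck equation
in the continuum limit a → 0:

$$ \frac{\partial P}{\partial t} = \gamma(T) \frac{\partial}{\partial x}(xP) + D(T) \frac{\partial^2 P}{\partial x^2}, \tag{37} $$

where

$$ \gamma(T) = \frac{1}{T} \left( 1 + (q-1)\frac{B}{T} \right)^{q/(1-q)} \tag{38} $$

$$ D(T) = \left( 1 + (q-1)\frac{B}{T} \right)^{1/(1-q)}. \tag{39} $$

We have rescaled the time unit by 1/a² as in [9].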

Our aim is to find the fastest possible asymptotic decrease of the expectation value of the potential 
defined by 

$$ y = \int dx\, E(x)\, P(x,t) \tag{40} $$

by adaptively changing T as a function of time. Differentiating both sides of the definition (40) and
using the Fokker-Planck equation (37), we obtain the following equation describing the time evolution
of y:

$$ \frac{dy}{dt} = -2\gamma(T)\, y + D(T). \tag{41} $$

The temperature is adaptively optimized by extremizing this right hand side with respect to T, yielding

$$ T_{\rm opt} = \frac{2yB + (1-q)B^2}{2y + B} = (1-q)B + 2qy + O(y^2). \tag{42} $$

The evolution equation (41) then has the asymptotic form

$$ \frac{dy}{dt} = -\frac{2}{B} \left( \frac{2q}{(1-q)B} \right)^{q/(1-q)} y^{1/(1-q)}. \tag{43} $$

The solution is

$$ y = B^{1/q} \left( \frac{1-q}{2q} \right)^{1/q} t^{-(1-q)/q}. \tag{44} $$

The optimum annealing schedule (42) is now

$$ T_{\rm opt} \sim (1-q)B + {\rm const} \cdot t^{-(1-q)/q}. \tag{45} $$

The asymptotic behaviour of the average position can be calculated in the same way. The result is

$$ \langle x \rangle = \int dx\, x\, P(x,t) \sim {\rm const} \cdot t^{-1/2q}. \tag{46} $$
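
The power-law relaxation of the energy can be checked by integrating the evolution equation for y numerically, with the temperature set at every instant to the value that extremizes the right hand side. The sketch below assumes q = 1/2, B = 1, the initial value y(0) = 1 and a simple forward-Euler scheme; these choices and all function names are illustrative assumptions, not part of the paper.

```python
def optimal_temperature(y, big_b, q):
    """Adaptively optimized temperature T_opt = (2yB + (1-q)B^2) / (2y + B)."""
    return (2.0 * y * big_b + (1.0 - q) * big_b ** 2) / (2.0 * y + big_b)


def energy_rate(y, big_b, q):
    """dy/dt = -2 gamma(T) y + D(T), evaluated at T = T_opt."""
    temp = optimal_temperature(y, big_b, q)
    base = 1.0 + (q - 1.0) * big_b / temp
    gamma = base ** (q / (1.0 - q)) / temp      # drift coefficient
    diffusion = base ** (1.0 / (1.0 - q))       # diffusion coefficient
    return -2.0 * gamma * y + diffusion


def integrate_energy(y0, t_final, dt=0.01, big_b=1.0, q=0.5):
    """Forward-Euler integration of dy/dt from t = 0 to t_final."""
    y = y0
    for _ in range(int(round(t_final / dt))):
        y += dt * energy_rate(y, big_b, q)
    return y


def asymptotic_energy(t, big_b=1.0, q=0.5):
    """Leading large-t form B^(1/q) ((1-q)/(2q))^(1/q) t^(-(1-q)/q)."""
    return big_b ** (1.0 / q) * ((1.0 - q) / (2.0 * q)) ** (1.0 / q) * t ** (-(1.0 - q) / q)


if __name__ == "__main__":
    for t_final in (10.0, 100.0, 1000.0):
        print(t_final, integrate_energy(1.0, t_final), asymptotic_energy(t_final))
```

The numerical value approaches the asymptotic form slowly, because the first correction to the leading behaviour is logarithmic in t and depends on the initial condition.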

It is useful to restrict the value of q to avoid unphysical behaviour of the generalized transition
probability in the present one-dimensional problem. One of the transition probabilities in the master
equation (36),

$$ \left( 1 + (q-1)\frac{B+\Delta_{i-1}}{T} \right)^{1/(1-q)}, \tag{47} $$

reduces for T = (1 − q)B + O(y) to

$$ \left( \frac{a^2 - 2ax}{2B} + O(y) \right)^{1/(1-q)}. \tag{48} $$

This quantity must be a small positive number for any x and sufficiently small (but fixed) a. This
requirement is satisfied if

$$ q = 1 - \frac{1}{2n} \qquad (n = 1, 2, \ldots). \tag{49} $$

Consistency of the other transition probabilities in (36) is also guaranteed under (49).

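From (44) we see that the fastest decrease of the energy is achieved when q = 1/2. With this value
of q,

$$ y \sim \frac{B^2}{4}\, t^{-1} \tag{50} $$

$$ T_{\rm opt} \sim \frac{B}{2} + \frac{B^2}{4} \frac{1}{t} \tag{51} $$

$$ \langle x \rangle \sim {\rm const} \cdot t^{-1}. \tag{52} $$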

It may be useful to remark that the non-vanishing value (1 − q)B of the temperature (45) in the
infinite-time limit does not cause trouble. What is required is not an asymptotically vanishing value
of the temperature but that the probability distribution does not change with time in the infinite-time
limit. This condition is satisfied if T = (1 − q)B, as is apparent from (37) with (38) and (39).

The results (50) and (51) show asymptotic relaxations proportional to t^{-1}, which is much faster
than those for the conventional transition probability, B/log t [9]. This result of course depends upon
the specific structure of the one-dimensional model. We are not claiming to have shown that the 
generalized transition probability with q < 1 always gives faster decrease of the temperature and 
energy. 

5 More general transition probability 

A natural question may arise on how far the arguments in the previous sections depend on the specific 
form of the acceptance probability (3). We investigate this problem for the one-dimensional model
treated in the preceding section. 

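The master equation is now generalized to

$$ \frac{dP_i}{dt} = f\!\left(\frac{B}{T}\right) P_{i+1} + f\!\left(\frac{B+\Delta_{i-1}}{T}\right) P_{i-1} - f\!\left(\frac{B+\Delta_i}{T}\right) P_i - f\!\left(\frac{B}{T}\right) P_i. \tag{53} $$

The same Fokker-Planck equation (37) is derived in the limit a → 0 with the following parameters

$$ \gamma(T) = -\frac{1}{T} f'\!\left(\frac{B}{T}\right) \tag{54} $$

$$ D(T) = f\!\left(\frac{B}{T}\right). \tag{55} $$

The expectation value of the potential obeys the same evolution equation as in (41):

$$ \frac{dy}{dt} = -2\gamma(T)\, y + D(T) = \frac{2y}{T} f'\!\left(\frac{B}{T}\right) + f\!\left(\frac{B}{T}\right) \equiv \mathcal{L}\!\left(\frac{1}{T}\right). \tag{56} $$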

Minimization of 𝓛(v) (v = 1/T) with respect to T for given y leads to

$$ 2vyB f''(Bv) + (2y + B) f'(Bv) = 0. \tag{57} $$

The solution of this equation for v gives the optimal annealing schedule

$$ v = g(y). \tag{58} $$

Assuming analyticity of g(y) as y → 0, we write (58) as

$$ v = c_1 + c_2 y + O(y^2). \tag{59} $$

It is required that the system stops its time evolution as y → 0 and v → c_1. We then have
f(Bc_1) = 0 from (56) assuming c_1 is finite. (This condition of c_1 < ∞ is not satisfied by the
conventional Boltzmann-type acceptance probability in which 1/v = T → 0 (c_1 → ∞) as y → 0.)
It is also necessary that the minimization condition (57) is satisfied in the same limit, leading to
f'(Bc_1) = 0. These two conditions on f and f' are satisfied if f(Bv) (= f(Bc_1 + Bc_2 y)) and its
derivative behave for small y as

$$ f(Bv) \sim c_3 y^k, \qquad f'(Bv) \sim -c_4 y^{k-1}, \tag{60} $$

where k > 1 and c_3, c_4 > 0. The minus sign in front of c_4 comes from the observation that an increase
of the inverse temperature v = 1/T means a decrease of the energy y and therefore the differentiations
by v and y should be done with the opposite sign (i.e. c_2 < 0).
The evolution equation (56) then has the form

$$ \frac{dy}{dt} = -c_5 y^k \tag{61} $$

with positive c_5 if 2c_1c_4 > c_3. This equation is solved as

$$ y = \big( c_5 (k-1)\, t \big)^{-1/(k-1)}, \tag{62} $$

which shows a power decay of the expectation value of the cost function. 

It is useful to set a restriction on k as in the preceding section for q. The following acceptance
probability for y → 0 should be positive for any x:



where we have used (60). This requirement is satisfied if k is a positive even number k = 2n. The
energy (62) then decays as

$$ y \sim t^{-1},\ t^{-1/3},\ t^{-1/5},\ \cdots, \tag{64} $$

the same formula as in the preceding section. In fact the argument in section 4 is recovered if we 
choose 

$$ f(v) = \big( 1 + (q-1)v \big)^{1/(1-q)}. \tag{65} $$

In this way the fast decrease of the energy has been obtained for a very general acceptance prob- 
ability distribution function satisfying certain analyticity conditions. 
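
As a worked step for the positivity requirement on k, the hopping rate with barrier B + Δ_{i−1} can be evaluated in the limit y → 0 from the expansion of v and the small-y behaviour of f. The following is a sketch derived here under those two assumptions, with Δ_{i−1} = ax − a²/2 following from E(x) = x²/2 and x = ai; it is a reconstruction of the kind of expression the requirement refers to, not necessarily the authors' own formula.

```latex
% From v = c_1 + c_2 y + O(y^2):  y \simeq (v - c_1)/c_2, so f(Bv) \sim c_3 y^k gives,
% for small z,
f(Bc_1 + z) \simeq c_3 \left( \frac{z}{B c_2} \right)^{k} .
% In the limit y \to 0 (v \to c_1) the rate for the barrier B + \Delta_{i-1} becomes
f\big( (B + \Delta_{i-1})\, c_1 \big) = f(Bc_1 + c_1 \Delta_{i-1})
  \simeq c_3 \left( \frac{c_1 (2ax - a^2)}{2 B c_2} \right)^{k} .
% Consistency check: for f(v) = (1 + (q-1)v)^{1/(1-q)} with q = 1/2 one has
% c_1 = 2/B, c_2 = -4/B^2, c_3 = 4/B^2, k = 2, and the right-hand side reduces to
% ((a^2 - 2ax)/(2B))^2.
```

The base of the power changes sign with x, so the expression is positive for every x only when k is even, in line with the restriction k = 2n.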


6 Discussions

We have proved weak ergodicity of the inhomogeneous Markov process generated by the generalized 
transition probability under certain conditions on the parameters. For technical reasons we were unable 
to prove strong ergodicity, or more strongly, convergence to the optimal distribution function. We 
could not show that the condition (14) of Theorem 2 is satisfied by the present inhomogeneous Markov
chain. However, weak ergodicity alone already means that the state of the system asymptotically 
becomes independent of the initial condition, and it is most likely that such an asymptotic state is 
the optimal one as mentioned in section 1. 

It is appropriate to comment on computational complexity here. The time t_1 necessary for the
temperature (4) to reach a small specified value δ is obtained by solving the relation b/t_1^c ~ δ (c =
(q − 1)/R) for t_1:

$$ t_1 \sim \exp\left( \frac{k_1 N}{q-1} \log\frac{b}{\delta} \right). \tag{66} $$

Here we have set R = k_1 N with N the system size because R defined in (28) is roughly of this order
of magnitude in many cases. For example, in the problem of spin glasses, one can reach any spin
configuration by flipping at most N spins. The corresponding time for the conventional simulated
annealing is

$$ t_2 \sim \exp\left( \frac{k_2 N}{\delta} \right), \tag{67} $$

which has been obtained from k_2 N/log t_2 ~ δ. A comparison of (66) with (67) reveals that the
coefficient of N in the exponent has been reduced from 1/δ to log(1/δ) by using the generalized transition
probability. In this sense, t_1 ≪ t_2. Since we have proved Theorem 3 under very general conditions
on the system (which would include NP-complete problems), it is not possible to find an
algorithm to reach a low-temperature state in polynomial time. The best we could achieve is an
improvement of the coefficient in the exponent.
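
The size of this improvement is easy to quantify by comparing the logarithms of the two time estimates. The sketch below is illustrative only: the function names are hypothetical, and the constants k_1, k_2 and b are set to 1 purely as an assumption.

```python
import math


def log_time_generalized(delta, n_size, q, b=1.0, k1=1.0):
    """log t_1 = (k1 N / (q - 1)) log(b / delta), i.e. b / t_1^c = delta with c = (q - 1)/(k1 N)."""
    return k1 * n_size / (q - 1.0) * math.log(b / delta)


def log_time_conventional(delta, n_size, k2=1.0):
    """log t_2 = k2 N / delta, from the inverse-log schedule k2 N / log t_2 = delta."""
    return k2 * n_size / delta


if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, log_time_generalized(delta, 100, 2.0), log_time_conventional(delta, 100))
```

Both logarithms remain proportional to N, so the generalized schedule is still exponential in the system size; only the δ-dependent coefficient changes from 1/δ to log(1/δ).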

One should be careful that the rapid decrease of the temperature does not immediately mean
a rapid decrease of the cost function. This aspect can be checked by comparing the acceptance
probability (3) at T = δ

$$ u_1(T = \delta) \sim \left( \frac{\delta}{(q-1)\Delta E} \right)^{1/(q-1)} \tag{68} $$

with the corresponding one for the conventional transition

$$ u_2(T = \delta) \sim \exp(-\Delta E/\delta). \tag{69} $$

Since u_1(δ) ≫ u_2(δ) if ΔE/δ ≫ 1, we see that the generalized transition probability at a given
temperature has a larger value to induce transitions into states with high values of the cost function 
than in the case of the conventional one at the same temperature. Thus the expectation value of the 
cost function may be larger under the generalized transition probability than under the conventional 
Boltzmann form at the same temperature if one waits sufficiently long until thermal equilibrium is 
reached. This phenomenon has actually been observed in a numerical investigation under a slightly 
different (but essentially similar) situation ||]. 
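
The gap between the two acceptance probabilities at the same low temperature can be made concrete with a short calculation. The sketch below compares the generalized form for q > 1, its low-temperature power-law approximation, and the Boltzmann factor; the function names and the sample values of ΔE, δ and q are assumptions chosen for illustration.

```python
import math


def u_generalized(delta_e, temp, q):
    """Generalized acceptance factor (1 + (q - 1) dE / T)^(1 / (1 - q)) for q > 1."""
    return (1.0 + (q - 1.0) * delta_e / temp) ** (1.0 / (1.0 - q))


def u_generalized_low_t(delta_e, temp, q):
    """Low-temperature approximation (T / ((q - 1) dE))^(1 / (q - 1))."""
    return (temp / ((q - 1.0) * delta_e)) ** (1.0 / (q - 1.0))


def u_boltzmann(delta_e, temp):
    """Conventional Boltzmann factor exp(-dE / T)."""
    return math.exp(-delta_e / temp)


if __name__ == "__main__":
    for temp in (0.5, 0.1, 0.05, 0.01):
        print(temp, u_generalized(1.0, temp, 2.0), u_generalized_low_t(1.0, temp, 2.0),
              u_boltzmann(1.0, temp))
```

The power-law factor falls off far more slowly than the exponential one as the temperature decreases, which is why uphill moves remain comparatively frequent under the generalized transition probability.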

Therefore, if the expectation value of the cost function is observed in a numerical simulation to 
indeed decrease rapidly under the generalized transition probability, it would be not only for the 
rapid decrease of the temperature but also because the relaxation time is shorter. The conventional 
transition probability may give a larger possibility for the system to stay longer in local minima with 
high values of the cost function. A mathematical analysis of this property of quick relaxation by the 
generalized transition probability is beyond the scope of the present paper. However one may naively 
expect it to happen from the larger probability to climb over high barriers as discussed above. 

It should be remarked that Theorem 3 with the annealing schedule (4) does not give a practically
useful prescription of simulated annealing. In actual numerical simulations one rarely uses such
annealing schedules as (4) obtained from worst-case estimates. Even exponentially fast decreases of the
temperature often give satisfactory results in the conventional and generalized methods (see M and 
references in || ). The significance of Theorem 3 is that convergence (in the sense of weak ergodicity)
has been proved with the annealing schedule (4) under the generalized transition (acceptance)
probability, where previously only empirical numerical investigations had been carried out without a
mathematical guarantee of convergence under any annealing schedule.

We thank Dr Naoki Kawashima, Dr Toshiyuki Tanaka, Dr Tsuyoshi Chawanya, and Prof Con- 
stantino Tsallis for discussions and comments. One of the authors (J. I.) acknowledges support from 
the Junior Research Associate Program of RIKEN. 